| Cluster | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -0.141 | -0.357 | 0.201 | -0.239 | 23.643 | -0.037 | -0.009 | -0.025 | -0.414 |
| 2 | -0.140 | -0.012 | -0.014 | -0.239 | -0.036 | -0.038 | -0.018 | -0.014 | 0.478 |
| 3 | 6.931 | 0.922 | 0.129 | 1.387 | 0.026 | 1.195 | 0.163 | 0.677 | -0.087 |
| 4 | -0.140 | -0.221 | -0.019 | -0.235 | -0.062 | -0.058 | -0.043 | -0.024 | -1.486 |
| 5 | -0.141 | 0.845 | 0.238 | 4.017 | 0.044 | 0.341 | 0.404 | 0.039 | 0.151 |
Grouping Games Through Clustering
Technique
To address the question of which video games are similar (and how), I turned to unsupervised learning. Specifically, I use k-means clustering to assign video games to distinct groups, which may uncover similarities between the games.
Variables
For this analysis, I chose the following 9 quantitative variables to cluster the games by:
required_age: The age required to buy and play the game.
price: The price of the game.
dlc_count: The amount of downloadable content (DLC) the game has. This is additional game content that can be purchased.
metacritic_score: The score given by Metacritic, a platform that rates games, movies, TV shows, and albums. This is a numeric score out of 100 based on a weighted average of highly respected critics’ reviews.
achievements: The number of achievements in the game.
recommendations: The number of recommendations that the game received.
median_playtime_forever: The median playtime in minutes since the game’s release.
peak_ccu: The highest number of concurrent users (CCU) of the game in a given period.
player_rating_ratio: A self-created statistic that describes how players feel about the game, computed by dividing the net positive review count (positive - negative) by the total number of reviews. The value ranges from -1 to 1, where -1 means that all reviews were negative (hated), 0 means that there were an equal number of positive and negative reviews (neutral), and 1 means that all reviews were positive (loved).
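As a minimal sketch of how player_rating_ratio can be computed from the definition above: the snippet is in Python purely for illustration and is not the code used in this analysis. The function name and the handling of games with zero reviews are assumptions, since the definition does not say what happens in that case.

```python
def player_rating_ratio(positive: int, negative: int) -> float:
    """Net positive review count divided by total reviews, in [-1, 1]."""
    total = positive + negative
    if total == 0:
        # Assumption: games with no reviews get NaN (not specified in the text).
        return float("nan")
    return (positive - negative) / total


print(player_rating_ratio(90, 10))  # mostly positive reviews
print(player_rating_ratio(5, 5))    # evenly split reviews
print(player_rating_ratio(0, 12))   # all negative reviews
```

A game with 90 positive and 10 negative reviews lands close to the "loved" end of the scale, while an even split sits at the neutral midpoint.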
Since the variables are on different scales (e.g., 0 to 100, -1 to 1, and 0 to 1,000,000+), it is important to standardize them (i.e., level the playing field). Otherwise, variables with larger ranges would dominate the distance calculations in k-means clustering, and with them the locations of the centroids.
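To make the standardization step concrete, here is a small Python sketch of z-scoring, offered as an illustration rather than the code behind this analysis. The `standardize` helper is hypothetical, and the use of the population standard deviation is an assumption; dividing by the sample standard deviation (n - 1) is an equally common convention. The example values are taken from the Metacritic and Peak CCU columns of the sample tables.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; sample SD is also common
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


metacritic = [0, 0, 79, 67, 86]   # bounded 0-100 scale
peak_ccu = [0, 1, 2715, 6, 404]   # unbounded counts

z_meta = standardize(metacritic)
z_ccu = standardize(peak_ccu)

# After standardizing, both columns have mean 0 and standard deviation 1,
# so neither dominates a Euclidean distance.
print([round(z, 3) for z in z_meta])
print([round(z, 3) for z in z_ccu])
```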
Elbow Plot
To determine a suitable number of clusters for the 9 quantitative variables, I first consulted an elbow plot. The goal is to find a number of clusters at which the within-cluster variance is relatively small, without creating so many clusters that the results become difficult to interpret or lose meaning. Based on the elbow plot (see Figure 1), 5 clusters appear to be a reasonable choice, as this is where the graph “bends” without containing too many clusters.
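The quantity behind an elbow plot can be sketched as follows. This is an illustrative pure-Python version of Lloyd's k-means algorithm on hypothetical toy data (three two-dimensional blobs), not the Steam data or the code used for Figure 1. The helper names, the number of random starts, and the seeds are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squares: the value plotted on the elbow curve.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


def elbow(points, k_values, n_starts=10):
    """Lowest WCSS over several random starts, for each candidate k."""
    return {k: min(kmeans(points, k, seed=s)[2] for s in range(n_starts))
            for k in k_values}


# Hypothetical toy data: three tight blobs in two dimensions.
rng = random.Random(42)
centers = [(0, 0), (5, 5), (0, 5)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in centers for _ in range(30)]

curve = elbow(points, range(1, 7))
for k, w in curve.items():
    print(k, round(w, 2))
```

Plotting `curve` with k on the horizontal axis gives the elbow plot; the "bend" is the point after which adding clusters buys only small reductions in within-cluster variance.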
Clusters
After performing k-means clustering with k = 5, the clusters appear to be characterized as follows (see Table 1):
Cluster 1: Higher DLC counts, extremely high number of achievements, and lower player ratings.
Cluster 2: Generally average values across the variables except for lower Metacritic scores and high player ratings.
Cluster 3: Highest age requirements, highest prices, higher Metacritic scores, highest number of recommendations, and highest peak CCU.
Cluster 4: Lowest median playtimes and lowest player ratings.
Cluster 5: Higher prices, highest DLC counts, highest Metacritic scores, higher number of recommendations, and highest median playtimes.
Therefore, Cluster 1 contains games that may cater to completionists, although they lack positive ratings. Cluster 2 contains “average” games that score low on Metacritic but are loved by players. Cluster 3 contains expensive games that cater to older audiences and are widely recommended and played. Cluster 4 contains games that players seem to dislike; they do not play these games for long and give them bad ratings. Finally, Cluster 5 contains expensive games with a lot of content for committed players; these games are widely recommended and receive high Metacritic scores.
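A cluster summary of this kind can be built by averaging each standardized variable within each cluster. The sketch below assumes that Table 1 reports values in standardized units, as its magnitudes suggest. It is illustrative Python, not the code used here; the toy z-scores, the helper name, and the cutoff of 1 standard deviation for flagging a "standout" variable are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def cluster_profile(rows, labels):
    """Mean of each (already standardized) variable within each cluster."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    features = list(rows[0])
    return {lab: {f: mean(r[f] for r in members) for f in features}
            for lab, members in sorted(groups.items())}


# Hypothetical standardized values for four games in two clusters.
rows = [
    {"achievements": 23.1, "player_rating_ratio": -0.5},
    {"achievements": 24.2, "player_rating_ratio": -0.3},
    {"achievements": -0.1, "player_rating_ratio": 0.5},
    {"achievements": 0.1, "player_rating_ratio": 0.4},
]
labels = [1, 1, 2, 2]

profile = cluster_profile(rows, labels)

# Assumption: flag a variable when the cluster mean is at least 1 SD from 0.
standouts = {lab: [f for f, z in feats.items() if abs(z) >= 1]
             for lab, feats in profile.items()}
print(profile)
print(standouts)
```

Because the variables are standardized, a cluster mean near 0 is "average" across all games, and a value such as 23.643 for achievements in Cluster 1 stands out immediately.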
Let’s see how well the clustering grouped the games! To do so, I will take a random sample (Wickham et al. 2023) of games within each cluster.
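Sampling within each cluster can be sketched as below. This is an illustrative Python version and not the code used in this analysis; the helper name and the fixed seed are assumptions, and the small input list simply reuses a few game names and cluster labels from the sample tables.

```python
import random
from collections import defaultdict


def sample_per_cluster(games, n=6, seed=0):
    """Draw up to n games at random, without replacement, from each cluster."""
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    by_cluster = defaultdict(list)
    for game in games:
        by_cluster[game["cluster"]].append(game)
    return {c: rng.sample(members, min(n, len(members)))
            for c, members in sorted(by_cluster.items())}


games = [
    {"name": "AZURA", "cluster": 1},
    {"name": "Achievement Idler: Black", "cluster": 1},
    {"name": "HUBE: Seeker of Achievements", "cluster": 1},
    {"name": "FINAL FANTASY IV", "cluster": 2},
    {"name": "All Fall Down", "cluster": 2},
    {"name": "Meet Your Maker", "cluster": 3},
]

samples = sample_per_cluster(games, n=2)
for cluster, picked in samples.items():
    print(cluster, [g["name"] for g in picked])
```

Sampling separately within each cluster guarantees that small clusters are represented, which a single random sample over all games would not.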
| Name | Cluster | Release Date | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Achievement Idler: Black | 1 | Apr 13, 2018 | 0 | 0.99 | 0 | 0 | 5000 | 278 | 0 | 1 | 0.681 |
| HUBE: Seeker of Achievements | 1 | May 16, 2018 | 0 | 0.99 | 0 | 0 | 4805 | 0 | 17 | 0 | 0.370 |
| Trivia Vault: Video Game Trivia Deluxe | 1 | Oct 24, 2017 | 0 | 14.99 | 0 | 0 | 5000 | 0 | 0 | 0 | 0.013 |
| Professor Watts Memory Match: Shapes And Colors | 1 | Jul 27, 2017 | 0 | 4.99 | 0 | 0 | 5000 | 0 | 0 | 0 | 0.042 |
| Trivia Vault: Classic Rock Trivia 2 | 1 | Aug 23, 2017 | 0 | 9.99 | 0 | 0 | 5000 | 0 | 0 | 0 | 0.081 |
| AZURA | 1 | Nov 13, 2017 | 0 | 1.99 | 0 | 0 | 4732 | 208 | 0 | 0 | 0.270 |
From Table 2, we can see that Achievement Idler: Black, HUBE: Seeker of Achievements, and Trivia Vault: Video Game Trivia Deluxe are similar as they have high achievement counts.
| Name | Cluster | Release Date | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Smart Game Booster | 2 | May 11, 2020 | 0 | 0.00 | 1 | 0 | 0 | 0 | 998 | 611 | 0.487 |
| FINAL FANTASY IV | 2 | Sep 8, 2021 | 0 | 17.99 | 0 | 0 | 30 | 1263 | 885 | 70 | 0.767 |
| Cowgirl Adventures | 2 | Jul 29, 2022 | 0 | 1.39 | 0 | 0 | 0 | 0 | 0 | 0 | 0.333 |
| Sunset Humanity | 2 | Oct 22, 2024 | 0 | 3.99 | 0 | 0 | 5 | 0 | 0 | 0 | 1.000 |
| All Fall Down | 2 | Oct 16, 2015 | 0 | 0.99 | 0 | 0 | 8 | 0 | 296 | 0 | 0.333 |
| The fairy tale you don't know | 2 | Jan 22, 2021 | 0 | 18.99 | 0 | 0 | 0 | 138 | 0 | 1 | 0.592 |
From Table 3, we can see that Smart Game Booster, FINAL FANTASY IV, and Cowgirl Adventures are similar as they have low Metacritic scores of 0 and positive player ratings.
| Name | Cluster | Release Date | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| The Jungle | 3 | Sep 1, 2020 | 16 | 19.99 | 0 | 0 | 0 | 0 | 0 | 0 | -0.556 |
| Meet Your Maker | 3 | Apr 4, 2023 | 17 | 29.99 | 1 | 0 | 43 | 1362 | 493 | 2715 | 0.651 |
| Hacker.exe | 3 | Sep 14, 2018 | 13 | 8.99 | 0 | 0 | 12 | 0 | 0 | 0 | -0.478 |
| Phantasmal: Survival Horror Roguelike | 3 | Apr 14, 2016 | 13 | 0.00 | 0 | 0 | 55 | 124 | 1 | 1 | -0.091 |
| DRAGON QUEST HEROES™ Slime Edition | 3 | Dec 3, 2015 | 13 | 39.99 | 0 | 0 | 50 | 1074 | 225 | 5 | 0.355 |
| Psihanul | 3 | Nov 7, 2019 | 16 | 2.99 | 0 | 0 | 9 | 0 | 0 | 0 | -1.000 |
From Table 4, we can see that The Jungle, Meet Your Maker, and Hacker.exe are similar as they have high required ages (13+) and are not free.
| Name | Cluster | Release Date | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| KING HAJWALA | 4 | Jul 29, 2021 | 0 | 6.99 | 0 | 0 | 0 | 0 | 0 | 0 | 0.200 |
| Stakeholder Game | 4 | Feb 10, 2022 | 0 | 9.99 | 0 | 0 | 0 | 0 | 0 | 0 | -0.455 |
| Ritter | 4 | Aug 12, 2018 | 0 | 0.99 | 0 | 0 | 0 | 0 | 0 | 0 | -0.500 |
| Gevaudan | 4 | Jun 8, 2017 | 0 | 2.99 | 0 | 0 | 0 | 0 | 8 | 0 | 0.231 |
| K.O.M.A | 4 | Mar 14, 2019 | 0 | 1.99 | 0 | 0 | 8 | 0 | 0 | 0 | 0.200 |
| Wondership Q | 4 | Jul 18, 2016 | 0 | 9.99 | 0 | 0 | 11 | 0 | 0 | 0 | -0.143 |
From Table 5, we can see that KING HAJWALA, Stakeholder Game, and Ritter are similar as they have low median playtimes (0 minutes!).
| Name | Cluster | Release Date | Required Age | Price ($) | DLC Count | Metacritic Score | Achievements | Recommendations | Median Playtime Forever (min) | Peak CCU | Player Rating Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Guild Wars: Eye of the North® | 5 | Oct 8, 2010 | 0 | 19.99 | 0 | 79 | 0 | 0 | 0 | 0 | 1.000 |
| Primal Carnage | 5 | Oct 29, 2012 | 0 | 4.99 | 9 | 67 | 69 | 4828 | 238 | 6 | 0.536 |
| Citizen Sleeper 2: Starward Vector | 5 | Jan 31, 2025 | 0 | 22.49 | 2 | 86 | 30 | 602 | 0 | 404 | 0.867 |
| True Love | 5 | Jun 27, 2024 | 0 | 500.00 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 |
| Armada 2526 Gold Edition | 5 | Feb 28, 2013 | 0 | 3.59 | 0 | 66 | 0 | 0 | 616 | 2 | 0.299 |
| Curse: The Eye of Isis | 5 | Aug 22, 2014 | 0 | 0.59 | 0 | 63 | 0 | 108 | 32 | 0 | 0.468 |
From Table 6, we can see that Guild Wars: Eye of the North®, Primal Carnage, and Citizen Sleeper 2: Starward Vector are similar as they have higher Metacritic Scores and are not free.
Conclusion
From Tables 2-6, we can see that even games within the same cluster are not completely alike. This is expected because the elbow plot (see Figure 1) still showed substantial within-cluster variability at 5 clusters. One way to decrease the variability is to find more variables that could address the question of “which games are similar?”. On the other hand, perhaps one of the 9 variables used in this analysis was inflating the within-cluster variability. In future analyses, I will look deeper into the quantitative variables by examining their distributions to check for discrepancies in the data.
Although clustering on the 9 quantitative variables did not group the games perfectly, we could still see some similarities based on game price, required age, and player reviews! Different games target different players, and the clusters capture at least part of that structure.
Bonus
If you have a favorite game on Steam, use the following table (see Table 7) created with the DT package (Xie et al. 2024) to see which cluster it is in! Does its classification make sense?